fix: optimise reading large file content#3767
Conversation
remove slop comment
|
You said the word "optimize" and you tagged me to review and this made me look at the surrounding code... so I did #3781 which is orthogonal to this but related. (The next optimization is to build up the Also all this clearly needs deduplication with the shell mcp |
* main: (34 commits) Token counting in Auto-compact uses provider metadata (#3788) docs: Add YouTube link to Git MCP Tutorial (#3831) feat: more robust client initialization for the app (#3830) Build app bundles on release branches always (#3789) fix param order of debug_conversation_fixer (#3796) Fix directory switcher not working in active chat sessions and file browser not defaulting to current session directory path (#3791) File completion in CLI (#3822) docs: Dynamic linux install buttons (#3810) tests: Add missing `#[serial]` to two tests (#3816) Chore: apply more clippy rules to prevent from code complexity (#3813) chore(mcp): Add helpers to parse parameters (#2821) feat: enable docusaurus respectPrefersColorScheme (#3746) fix session resume in new window (#3800) Add settings field documentation to recipe guides (#3809) chore(deps): bump on-headers and compression in /documentation (#3532) fix(ui): refresh provider related issues (#3385) feat: Add comprehensive Linux build support (#3673) developer: Optimize text_editor_view a bit (#3781) Override session name generator for ollama provider (#3710) docs: fix markdown for cognee tutorial (#3801) ...
|
ugh new clippy rules means this will have a lot of unrelated changes (but not functional changes) |
|
thanks @cgwalters @katzdave for feedback - it is a slightly larger appearing change as it simplifies the text_editor_view top level function (hard to see that in diffs, but if you see the code it looks simpler/neater) |
| } | ||
| }; | ||
| let lines: Vec<&str> = content.lines().collect(); | ||
| let total_lines = lines.len(); |
There was a problem hiding this comment.
could be paranoid here and check the max line length of lines
There was a problem hiding this comment.
yeah I wasn't sure how to break that down - as it is a problem (ie if you tell goose to look at your session jsonl files - will get heavy).
The issue then is - the view editor tool doesn't have a range that looks inside a line (so it would need to be enhanced) so either part of this change or a follow on to have it take a column range as well as row start and finish?
There was a problem hiding this comment.
will have a follow on task to make it read by column too when needed
| }; | ||
| let mut content = String::new(); | ||
| f.read_to_string(&mut content) | ||
| .map_err(|e| ToolError::ExecutionError(format!("Failed to read file: {}", e)))?; |
There was a problem hiding this comment.
@DOsinga this would return if it is a binary
* main: fix: replace glob/grep tool with shell (#3834) docs: Add Youtube Link to dev.to tutorial (#3869) Changed app settings configuration form to match settings panels (#3829) Tell the user to hit compact (#3851) Pin @mcp-ui/client in package.json (#3860) blog for mcp-jupyter server (#3059) docs: Adding dev.to Tutorial & Update CLI Component (#3828) Detect client disconnects and cancel tool calls (#3782) Suppress ansi with pipes (#3775)
…e-editable-displayable-title * upstream/main: (134 commits) fix: optimise reading large file content (block#3767) fix: replace glob/grep tool with shell (block#3834) docs: Add Youtube Link to dev.to tutorial (block#3869) Changed app settings configuration form to match settings panels (block#3829) Tell the user to hit compact (block#3851) Pin @mcp-ui/client in package.json (block#3860) blog for mcp-jupyter server (block#3059) docs: Adding dev.to Tutorial & Update CLI Component (block#3828) Detect client disconnects and cancel tool calls (block#3782) Suppress ansi with pipes (block#3775) Fix leaky env variable causing flaky test (block#3761) Update gemini error msg (block#3847) Generic retry and error parsing (block#3558) Clear the current line on ctrl-c in line with other tools (block#3764) chore: upgrade morph to use new model with instruction (block#3745) add CODEOWNERS file with /documentation owners (block#3840) Token counting in Auto-compact uses provider metadata (block#3788) docs: Add YouTube link to Git MCP Tutorial (block#3831) feat: more robust client initialization for the app (block#3830) Build app bundles on release branches always (block#3789) ...
* main: (33 commits) fix: optimise reading large file content (#3767) fix: replace glob/grep tool with shell (#3834) docs: Add Youtube Link to dev.to tutorial (#3869) Changed app settings configuration form to match settings panels (#3829) Tell the user to hit compact (#3851) Pin @mcp-ui/client in package.json (#3860) blog for mcp-jupyter server (#3059) docs: Adding dev.to Tutorial & Update CLI Component (#3828) Detect client disconnects and cancel tool calls (#3782) Suppress ansi with pipes (#3775) Fix leaky env variable causing flaky test (#3761) Update gemini error msg (#3847) Generic retry and error parsing (#3558) Clear the current line on ctrl-c in line with other tools (#3764) chore: upgrade morph to use new model with instruction (#3745) add CODEOWNERS file with /documentation owners (#3840) Token counting in Auto-compact uses provider metadata (#3788) docs: Add YouTube link to Git MCP Tutorial (#3831) feat: more robust client initialization for the app (#3830) Build app bundles on release branches always (#3789) ...
* 'main' of github.com:block/goose: Make the window title reflect what we are doing (#3883) additional metrics + Ui implementation (#3871) feat: Add session description editing functionality (#3819) Update filename in contributing docs (#3866) Fix voice dictation provider selection bug (#3862) doc: Update supported container runtimes (#3874) feat: add OAuth provider abstraction for CLI configuration (#3157) Don't ignore lockfiles on linux/windows builds (#3859) Use RMCP for StreamableHTTP OAuth support (#3845) Try to keep key order for Databricks (#3876) Fix OpenAI Provider with GitHub Models (#3875) Cmd click open finder (#3807) fix: recipe parameter form max height and not scrolling (#3879) fix: optimise reading large file content (#3767) fix: replace glob/grep tool with shell (#3834) docs: Add Youtube Link to dev.to tutorial (#3869)
* 'main' of github.com:block/goose: Make the window title reflect what we are doing (#3883) additional metrics + Ui implementation (#3871) feat: Add session description editing functionality (#3819) Update filename in contributing docs (#3866) Fix voice dictation provider selection bug (#3862) doc: Update supported container runtimes (#3874) feat: add OAuth provider abstraction for CLI configuration (#3157) Don't ignore lockfiles on linux/windows builds (#3859) Use RMCP for StreamableHTTP OAuth support (#3845) Try to keep key order for Databricks (#3876) Fix OpenAI Provider with GitHub Models (#3875) Cmd click open finder (#3807) fix: recipe parameter form max height and not scrolling (#3879) fix: optimise reading large file content (#3767) fix: replace glob/grep tool with shell (#3834) docs: Add Youtube Link to dev.to tutorial (#3869)
by default the text_editor_view tool will read any file (under 400KB) but this can be harmful even at smaller size in filling up the context at a high clip.
With this change it will encourage the agent to search the file and read in ranges IFF it is > 2000 lines long (below that works as of today). It works by letting the agent know on first call that it is a large file (if it gets no range) and that it should either search for content, or call back specifying the full range explicitly to load if it still wants to.
I picked "2000 lines" as a reasonable size (eg for a document or source code) on a human level that is large to cut off at.
This means that you can run a query like this:
which is a large file, and would normally significantly use up the context, but now is only sipping it:
Before this change it would be at least this usage:
and much worse for larger files (yet can now do the same thing)
cc @cgwalters for idea